Detecting Camouflaged Content Polluters
Authors
Abstract
The connectivity and openness of the Internet have driven a rapid expansion of online media websites. However, this culture of openness also makes the emerging platforms an effective channel for content pollution, such as fraud, phishing, and other online abuses. To complicate the problem, content polluters actively manipulate the characteristics of the Internet by establishing links with normal users and blending malicious information with legitimate content. The manipulated links and content, used as camouflage, make content polluters very difficult to detect. Recent work has investigated camouflaged fraud in networks. However, because label information for camouflaged content is rarely available, it is challenging to detect content polluters with traditional approaches. In this paper, we make the first attempt at detecting camouflaged content polluters. To evaluate the proposed approach, we conduct experiments on real-world data. The results show that our method achieves better results than existing approaches.
Copyright © 2017, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.
Introduction
Motivated by monetary rewards, content polluters, which include fraudsters, scammers, and spammers, unfairly overpower normal users by spreading disinformation (Wu et al. 2016), which undermines the role of Internet media in sustaining a society as a collective entity. An emerging characteristic that further complicates the problem is camouflage. Due to the openness of Internet media, it is easy for content polluters to copy a significant portion of content from normal users. Polluting content that is camouflaged by legitimate messages can be very deceptive because of cognitive inertia: once many genuine posts from a fraudster establish trust, the fraudulent post is likely to convince many of the readers.
Recent studies have investigated the camouflage of fraudsters from the perspective of network structure (Hooi et al. 2016; Wu et al. 2017), showing that network camouflage can be detected efficiently by studying the abnormal graph density caused by the camouflage links. In this work, we focus on the other side of the problem, i.e., detecting content polluters in the presence of content camouflage. To illustrate the problem, we show a toy example in Figure 1, where a normal user posts A, B, and C, and the adversarial rival copies them to camouflage the polluting post D. Our goal is to detect content polluters in the presence of such camouflage.
Detecting camouflaged content polluters is particularly challenging. Because of the massive amount of content on Internet media, label information for camouflaged posts is rarely available. Another challenge is data scarcity. Since camouflage can make up the majority of a content polluter's posts, it is not easy to identify the scarce polluting evidence, and labeling it manually would be labor-intensive.
To tackle these challenges, we propose to utilize the label information of accounts. Account labels are easier to obtain and are publicly available at a relatively large scale on various platforms. Recent studies suggest that camouflage tends to be random, whereas malicious content is alike because polluters share similar fraudulent targets (Hooi et al. 2016). We therefore assume that the intersection of content polluters' posts in the feature space is more likely to be a signal of polluting content. Hence, we aim to investigate how Camouflaged Content Polluters can be detected with Discriminant Analysis. In particular, we introduce a novel method, CCPDA, which effectively detects content polluters by mining signals of camouflaged pollution.
Major contributions of this work are summarized below:
• Formally define the problem of detecting camouflaged content polluters;
• Propose a novel method, CCPDA, to efficiently detect camouflaged content polluters; and
• Conduct extensive experiments to evaluate the effectiveness and efficiency of CCPDA.
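The assumption stated in the introduction, that content shared across polluter accounts is more likely to be polluting than content copied as camouflage, can be illustrated with a small sketch. This is not the CCPDA method, whose details are not given in this excerpt; it is a plain frequency contrast over account-level labels, and the function name `pollution_scores`, the account names, and the second normal user and second polluter are hypothetical additions for illustration.

```python
from collections import Counter


def pollution_scores(accounts, labels):
    """Score each post by how much more often it appears among labeled
    polluter accounts than among labeled normal accounts.

    Illustrative frequency contrast only; NOT the paper's CCPDA method.
    """
    polluters = [a for a in accounts if labels[a] == "polluter"]
    normals = [a for a in accounts if labels[a] == "normal"]
    if not polluters or not normals:
        raise ValueError("need at least one account of each class")

    # Count, per post, the number of accounts in each class that contain it.
    in_polluters = Counter(p for a in polluters for p in set(accounts[a]))
    in_normals = Counter(p for a in normals for p in set(accounts[a]))

    posts = set(in_polluters) | set(in_normals)
    return {
        p: in_polluters[p] / len(polluters) - in_normals[p] / len(normals)
        for p in posts
    }


# Toy data in the spirit of the paper's example: a polluter copies the
# normal user's posts A, B, C to camouflage the polluting post D.
# normal_2 and polluter_2 are hypothetical extra accounts.
accounts = {
    "normal_1": ["A", "B", "C"],
    "normal_2": ["E", "F"],
    "polluter_1": ["A", "B", "C", "D"],
    "polluter_2": ["E", "F", "D"],
}
labels = {
    "normal_1": "normal",
    "normal_2": "normal",
    "polluter_1": "polluter",
    "polluter_2": "polluter",
}

scores = pollution_scores(accounts, labels)
top_post = max(scores, key=scores.get)
print(top_post, scores[top_post])
```

In this toy setting the copied posts appear equally often in both classes, so their scores cancel, while D is the only post common to the polluters and absent from normal users. Only account labels are used, which matches the paper's motivation that post-level labels are scarce.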
Similar resources
Real-time Detection of Content Polluters in Partially Observable Twitter Networks
Content polluters, or bots that hijack a conversation for political or advertising purposes are a known problem for event prediction, election forecasting and when distinguishing real news from fake news in social media data. Identifying this type of bot is particularly challenging, with state-of-the-art methods utilising large volumes of network data as features for machine learning models. Su...
Seven Months with the Devils: A Long-Term Study of Content Polluters on Twitter
The rise in popularity of social networking sites such as Twitter and Facebook has been paralleled by the rise of unwanted, disruptive entities on these networks—including spammers, malware disseminators, and other content polluters. Inspired by sociologists working to ensure the success of commons and criminologists focused on deterring vandalism and preventing crime, we present the first long...
Devils, Angels, and Robots: Tempting Destructive Users in Social Media
Social media sites derive their value by providing a popular and dependable community for participants to engage, share, and interact. This community value and related services like search and advertising are threatened by spammers, content polluters, and malware disseminators. In an effort to preserve community value and ensure long-term success, we present a prototype system for automatically...
Breaking camouflage and detecting targets require optic flow and image structure information.
Use of motion to break camouflage extends back to the Cambrian [In the Blink of an Eye: How Vision Sparked the Big Bang of Evolution (New York Basic Books, 2003)]. We investigated the ability to break camouflage and continue to see camouflaged targets after motion stops. This is crucial for the survival of hunting predators. With camouflage, visual targets and distracters cannot be distinguishe...
Too Much R&D Although Polluters Underestimate Environmental Harm?
This paper shows that polluters who underestimate environmental harm might invest excessively in promoting technological change in pollution control under environmental liability law. Neither strict liability nor negligence can prevent such a distortion. However, we define a second best due care standard under negligence that induces a welfare superior R&D equilibrium. If, on the other hand, po...